Lab 3 - Basic Artificial Neural Network

Here we will build a very rudimentary Artificial Neural Network (ANN) and use it to solve some basic classification problems. The example is implemented using only basic math and linear algebra operations, which lets us study how each part of the network works and gain an intuitive understanding of what it is doing. In future labs we will use pre-built libraries such as Keras, which automate and optimize much of this functionality, making networks much faster and easier to build and train.

The code and MNIST test data are taken directly from http://neuralnetworksanddeeplearning.com/ by Michael Nielsen. Please review the first chapter of the book for a thorough explanation of the code.


In [ ]:
import random
import numpy as np

In [ ]:
class Network(object):
    
    def __init__(self, sizes):
        
        """The list ``sizes`` contains the number of neurons in the
        respective layers of the network.  For example, if the list
        was [2, 3, 1] then it would be a three-layer network, with the
        first layer containing 2 neurons, the second layer 3 neurons,
        and the third layer 1 neuron.  The biases and weights for the
        network are initialized randomly, using a Gaussian
        distribution with mean 0, and variance 1.  Note that the first
        layer is assumed to be an input layer, and by convention we
        won't set any biases for those neurons, since biases are only
        ever used in computing the outputs from later layers."""
        
        self.num_layers = len(sizes)
        self.sizes = sizes
        self.biases = [np.random.randn(y, 1) for y in sizes[1:]]
        self.weights = [np.random.randn(y, x)
                       for x, y in zip(sizes[:-1], sizes[1:])]
        
    def feedforward(self, a):
        
        """Return the output of the network if ``a`` is the input."""
        
        for b, w in zip(self.biases, self.weights):
            a = sigmoid(np.dot(w, a)+b)
        return a
    
    def SGD(self, training_data, epochs, mini_batch_size, eta,
            test_data=None):
        
        """Train the neural network using mini-batch stochastic
        gradient descent.  The "training_data" is a list of tuples
        "(x, y)" representing the training inputs and the desired
        outputs.  The other non-optional parameters are
        self-explanatory.  If "test_data" is provided then the
        network will be evaluated against the test data after each
        epoch, and partial progress printed out.  This is useful for
        tracking progress, but slows things down substantially."""

        if test_data: n_test = len(test_data)
        n = len(training_data)
        for j in range(epochs):
            random.shuffle(training_data)
            mini_batches = [
                training_data[k:k+mini_batch_size]
                for k in range(0, n, mini_batch_size)]
            for mini_batch in mini_batches:
                self.update_mini_batch(mini_batch, eta)
            
            if test_data:
                print("Epoch {0}: {1} / {2}".format(
                    j, self.evaluate(test_data), n_test))
            else:
                print("Epoch {0} complete".format(j))
    
    def update_mini_batch(self, mini_batch, eta):
        
        """Update the network's weights and biases by applying
        gradient descent using backpropagation to a single mini batch.
        The "mini_batch" is a list of tuples "(x, y)", and "eta"
        is the learning rate."""

        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        for x, y in mini_batch:
            delta_nabla_b, delta_nabla_w = self.backprop(x, y)
            nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
            nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
        self.weights = [w-(eta/len(mini_batch))*nw 
                        for w, nw in zip(self.weights, nabla_w)]
        self.biases = [b-(eta/len(mini_batch))*nb 
                       for b, nb in zip(self.biases, nabla_b)]
        
    def backprop(self, x, y):
        
        """Return a tuple ``(nabla_b, nabla_w)`` representing the
        gradient for the cost function C_x.  ``nabla_b`` and
        ``nabla_w`` are layer-by-layer lists of numpy arrays, similar
        to ``self.biases`` and ``self.weights``."""
        
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        
        # feedforward
        activation = x
        activations = [x] # list to store all the activations, layer by layer
        zs = [] # list to store all the z vectors, layer by layer
        for b, w in zip(self.biases, self.weights):
            z = np.dot(w, activation)+b
            zs.append(z)
            activation = sigmoid(z)
            activations.append(activation)
        # backward pass
        delta = self.cost_derivative(activations[-1], y) * \
            sigmoid_prime(zs[-1])
        nabla_b[-1] = delta
        nabla_w[-1] = np.dot(delta, activations[-2].transpose())
        
        """Note that the variable l in the loop below is used a little
        differently to the notation in Chapter 2 of the book.  Here,
        l = 1 means the last layer of neurons, l = 2 is the
        second-last layer, and so on.  It's a renumbering of the
        scheme in the book, used here to take advantage of the fact
        that Python can use negative indices in lists."""
        
        for l in range(2, self.num_layers):
            z = zs[-l]
            sp = sigmoid_prime(z)
            delta = np.dot(self.weights[-l+1].transpose(), delta) * sp
            nabla_b[-l] = delta
            nabla_w[-l] = np.dot(delta, activations[-l-1].transpose())
        return (nabla_b, nabla_w)

    def evaluate(self, test_data):
        
        """Return the number of test inputs for which the neural
        network outputs the correct result. Note that the neural
        network's output is assumed to be the index of whichever
        neuron in the final layer has the highest activation."""
        
        test_results = [(np.argmax(self.feedforward(x)), y)
                        for (x, y) in test_data]
        return sum(int(x == y) for (x, y) in test_results)

    def cost_derivative(self, output_activations, y):
        """Return the vector of partial derivatives \partial C_x /
        \partial a for the output activations."""
        return (output_activations-y)

In [ ]:
def sigmoid(z):
    # The sigmoid function.
    return 1.0/(1.0 + np.exp(-z))

def sigmoid_prime(z):
    # Derivative of the sigmoid function.
    return sigmoid(z)*(1-sigmoid(z))
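
Before moving on to real data, we can sanity-check the Network class with a quick feedforward pass on a random input. This is just a minimal sketch (the layer sizes are arbitrary): a [2, 3, 1] network should map a 2x1 input column vector to a single output value between 0 and 1.


In [ ]:
# minimal sanity check of the Network class defined above
net = Network([2, 3, 1])       # 2 inputs, one hidden layer of 3 neurons, 1 output
x = np.random.randn(2, 1)      # a random 2x1 input column vector
print(net.feedforward(x))      # a 1x1 array with a value between 0 and 1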

Iris dataset example

Now we will test our basic artificial neural network on a very simple classification problem. First we will use the seaborn data visualization library to load the 'iris' dataset, which consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica and Iris versicolor), with four features measuring the length and the width of each flower's sepals and petals. After we load the data we will visualize it using some functions in seaborn.


In [ ]:
%matplotlib inline
import seaborn as sns; sns.set(style="ticks", color_codes=True)
from sklearn.preprocessing import OneHotEncoder
from sklearn.utils import shuffle

iris_data = sns.load_dataset("iris")

# randomly shuffle data
iris_data = shuffle(iris_data)

# print first 5 data points
print(iris_data[:5])

# create pairplot of iris data
g = sns.pairplot(iris_data, hue="species")

Next, we will prepare the data set for training in our ANN. In order to work with our functions, the data needs to be converted to numpy format, split into feature and target arrays, and then recombined as a single list of (input, target) pairs. Finally, we split the data into training and test sets, and convert the targets of the training set to 'one-hot' encoding (OHE). OHE converts each categorical value into a list of binary values whose length equals the number of categories, with a '1' in the position of that category and '0' everywhere else. Our dataset has 3 possible categories, which np.unique orders alphabetically, so setosa becomes [1,0,0], versicolor becomes [0,1,0], and virginica becomes [0,0,1]. OHE is a standard format for target data because it lets the cost function compare the network's output vector directly to the target during training. The test targets are left as plain integer labels, since the evaluate method compares the index of the most active output neuron to the label directly.
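
To make the encoding concrete, below is a minimal sketch of the same np.unique + OneHotEncoder steps used in the next cell, applied to a small made-up array of species names (the example values are for illustration only).


In [ ]:
# illustrative example of the one-hot encoding used below
species = np.array(['setosa', 'versicolor', 'virginica', 'setosa'])

# np.unique sorts the categories alphabetically and returns an integer code per sample
_, codes = np.unique(species, return_inverse=True)
codes = codes.reshape(-1, 1)

# fit the encoder on the integer codes and convert them to one-hot vectors
enc = OneHotEncoder()
enc.fit(codes)
print(enc.transform(codes).toarray())   # a 4x3 array with a single 1 in each row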


In [ ]:
# convert iris data to numpy format
iris_array = iris_data.to_numpy()

# split data into feature (X) and target (y) arrays
X = iris_array[:, :4].astype(float)
y = iris_array[:, -1]

_, y = np.unique(y, return_inverse=True)
y = y.reshape(-1,1)

# create a one-hot encoder and fit it to the integer labels
enc = OneHotEncoder()
enc.fit(y)

# combine feature and target data
data = []
for i in range(X.shape[0]):
    data.append(tuple([X[i].reshape(-1,1), y[i][0]]))

# split data into training and test sets
trainingSplit = int(.8 * len(data))
training_data = data[:trainingSplit]
test_data = data[trainingSplit:]

# convert training targets to one-hot encoding
training_data = [[_x, enc.transform(_y.reshape(-1,1)).toarray().reshape(-1,1)] for _x, _y in training_data]

In [ ]:
net = Network([4, 25, 3])
net.SGD(training_data, 21, 10, .1, test_data=test_data)

MNIST dataset example

Next, we will test our ANN on another, slightly more difficult classification problem. The data set we'll be using is called MNIST, which contains tens of thousands of scanned images of handwritten digits, each labeled with the digit (0-9) it represents. The name MNIST comes from the fact that it is a Modified (M) version of a dataset originally developed by the United States' National Institute of Standards and Technology (NIST). It is a very popular dataset for measuring the effectiveness of Machine Learning models for image recognition. This time we don't have to do as much data management, since the data is already provided in the right format by the mnist_loader module imported below.

We will get into more details about working with images and proper data formats for image data in later labs, but you can already use this data to test the effectiveness of our network. With the default settings you should be able to get a classification accuracy of 95% on the test set.


In [3]:
import mnist_loader

training_data, validation_data, test_data = mnist_loader.load_data_wrapper()
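
Each training example returned by the loader is a tuple containing a 784x1 input vector and a 10x1 one-hot target, while each test example pairs a 784x1 input with a plain integer label. The quick check below is a sketch based on that format (assuming the loader from the book's code, as imported above).


In [ ]:
# inspect the format of the loaded data
x, y = training_data[0]
print(x.shape, y.shape)    # expect (784, 1) and (10, 1)

tx, ty = test_data[0]
print(tx.shape, ty)        # expect (784, 1) and an integer digit label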

We can use the matplotlib library to visualize one of the training images. In the data set, the pixel values of each 28x28 pixel image are stored as a flat list of 784 numbers, so before we visualize it we have to use numpy's reshape function to convert it back to a 28x28 matrix.


In [9]:
%matplotlib inline
import matplotlib.pylab as plt

img = training_data[0][0][:,0].reshape((28,28))

fig = plt.figure()
plt.imshow(img, interpolation='nearest', vmin = 0, vmax = 1, cmap=plt.cm.gray)
plt.axis('off')
plt.show()
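
Since the training targets are one-hot encoded, we can also check which digit the displayed image is labeled as (a small follow-up sketch using the same example):


In [ ]:
# the target of the first training example is a 10x1 one-hot vector;
# argmax recovers the digit it encodes
print(np.argmax(training_data[0][1]))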



In [ ]:
net = Network([784, 30, 10])
net.SGD(training_data, 30, 10, 3.0, test_data=test_data)